.\" Hey, EMACS: -*- nroff -*-
.\" First parameter, NAME, should be all caps
.\" Second parameter, SECTION, should be 1-8, maybe w/ subsection
.\" other parameters are allowed: see man(7), man(1)
.\" Please adjust this date whenever revising the manpage.
.\" 
.\" Some roff macros, for reference:
.\" .nh        disable hyphenation
.\" .hy        enable hyphenation
.\" .ad l      left justify
.\" .ad b      justify to both left and right margins
.\" .nf        disable filling
.\" .fi        enable filling
.\" .br        insert line break
.\" .sp <n>    insert n+1 empty lines
.\" for manpage-specific macros, see man(7)
.TH "DMTCP" "1" "June 17, 2008" "DMTCP team" "Distributed MultiThreaded CheckPointing"
.SH "NAME"
dmtcp \- Distributed MultiThreaded Checkpointing
.SH "SYNOPSIS"
.B dmtcp_coordinator
.RI [port]
.br 

.B dmtcp_launch 
.RI command
.RI [args...]
.br 

.B dmtcp_restart
.RI ckpt_FILE1.dmtcp
.RI [ckpt_FILE2.dmtcp...]

.B dmtcp_command
.RI coordinatorCommand

.SH "DESCRIPTION"
\fBDMTCP\fP is a tool to transparently checkpointing the state of an arbitrary
group of programs spread across many machines and connected by sockets. It
does not modify the user's program nor the operating system.
\fBMTCP\fP is a standalone component of DMTCP available as a checkpointing
library for a single process.
.SH "OPTIONS"
For each command, the \-\-help or \-h flag will show the command-line options.
Most command line options can also be controlled through environment variables.
These can be set in bash with "export NAME=value"
or in tcsh with "setenv NAME value".

.IP  DMTCP_CHECKPOINT_INTERVAL=integer
Time in seconds between automatic checkpoints.  Checkpoints can also be
initiated manually by typing 'c' into the coordinator. (default: 0, disabled;
dmtcp_coordinator only)

.IP  DMTCP_HOST=string
Hostname where the cluster\-wide coordinator is running. (default: localhost;
dmtcp_launch, dmtcp_restart only)

.IP  DMTCP_PORT=integer
The port the cluster\-wide coordinator listens on. (default: 7779)

.IP  DMTCP_GZIP=(1|0)
Set to "0" to disable compression of checkpoint images.
(default: 1, compression enabled; dmtcp_launch only)
WARNING:  gzip adds seconds.  Without gzip, ckpt/restart is often less than 1 s

.IP  DMTCP_CHECKPOINT_DIR=path
Directory to store checkpoint images in. (default: ./)

.IP  DMTCP_SIGCKPT=integer
Internal signal number to use for checkpointing.  Must not be used by the
user program.
(default: SIGUSR2; dmtcp_launch only)
.SH "DMTCP_COORDINATOR"
Each computation to be checkpointed must include a DMTCP coordinator
process.
One can explicitly start a coordinator through dmtcp_coordinator,
or allow one to be started implicitly in background by either dmtcp_launch
or dmtcp_restart to operate.
The address of the unique coordinator should be specified by
dmtcp_launch, dmtcp_restart, and dmtcp_command either through
the \-\-host and \-\-port command-line flags or through the
the DMTCP_HOST and DMTCP_PORT environment variables.  If neither is
given, the host-port pair defaults to localhost-7779.
The host-port pair associated with a particular coordinator is
given by the command-line flags used in the dmtcp_coordinator command,
or the environment variables then in effect, or the default of localhost-7779.

The coordinator is stateless and is \fBnot\fR checkpointed.
On restart, one can use an existing or a new coordinator.
Multiple computations under DMTCP control can coexist by providing
a unique coordinator (with a unique host-port pair) for each such
computation.

The coordinator initiates a checkpoint for all processes in its computation
group.  Checkpoints can be:  performed automatically on an interval (see
DMTCP_CHECKPOINT_INTERVAL above); or initiated manually on the
standard input of the coordinator (see next paragraph);
or initiated directly under program control by the computation through
the plugin API (see below).

The coordinator accepts the following commands on its standard input.
Each command should be followed by the <return> key.  The commands are: 
.br 
  l : List connected nodes
.br 
  s : Print status message
.br 
  c : Checkpoint all nodes
.br 
  f : Force a restart even if there are missing nodes (debugging)
.br 
  k : Kill all nodes
.br 
  q : Kill all nodes and quit
.br 
  ? : Show this message

Coordinator commands can also be issued remotely using \fBdmtcp_command\fR.

.SH "EXAMPLE USAGE"
.TP  
1. In a separate terminal window, start the dmtcp_coordinator.
(See previous section.)

 dmtcp_coordinator

.TP 
2. In separate terminal(s), replace each command(s) with "dmtcp_launch
[command]".  The checkpointed program will connect to the coordinator
specified by DMTCP_HOST and DMTCP_PORT.  New threads will be
checkpointed as part of the process.  Child processes will
automatically be checkpointed.  Remote processes started via \fIssh\fR
will automatically checkpointed. (Internally, DMTCP modifies the
\fIssh\fR command line to call dmtcp_launch on the remote host.)

 dmtcp_launch ./myprogram

.TP 
3. To manually initiate a checkpoint, either run the command below
or type "c" followed by <return> into the coordinator.  Checkpoint
files for each process will be written to DMTCP_CHECKPOINT_DIR. The
dmtcp_coordinator will write "dmtcp_restart_script.sh" to its working
directory.  This script contains the necessary calls to dmtcp_restart
to restart the entire computation, including remote processes created via
\fIssh\fR.

     dmtcp_command \-c
.br 
OR:  dmtcp_command \-\-checkpoint

.TP 
4. To restart, one should execute dmtcp_restart_script.sh, which is
created by the dmtcp_coordinator in its working directory at the time
of checkpoint. One can optionally edit this script to migrate
processes to different hosts.  By default, only one restarted process
will be restarted in the foreground and receive the standard input.
The script may be edited to choose which process will be restarted in
the foreground.

 ./dmtcp_restart_script.sh

.SH "DMTCP PLUGIN"

The source distribution includes a top-level \fIplugin\fR directory,
with examples of how to write a plugin for DMTCP.  Plugins allow
a checkpointed application to disconnect from an external resource,
and then reconnect to it at the time of restart.  Further
examples are in the \fItest/plugin\fR directory.  In particular,
\fItest/plugin/applic-initiated\fR demonstrates how an application
can request a checkpoint under program control.

The plugin feature adds three new user-programmable capabilities.
A plugin may: add wrappers around system calls; take special actions
at during certain events (e.g. pre-checkpoint, resume/post-checkpoint,
restart); and may insert key-value pairs into a database at restart
time that is then available to be queried by the restarted processes of
a computation.
(The events available to the plugin feature form a superset
of the events available with the dmtcpaware interface.)
One or more plugins are invoked via a list of colon-separated
absolute pathnames.

  dmtcp_launch \-\-with\-plugin PLUGIN1[:PLUGIN2]...

.SH "DMTCPAWARE API"
This API is now deprecated in favor of DMTCP plugins (see above).
DMTCP provides a programming interface to allow checkpointed applications
to interact with dmtcp.  In the source distribution, see
dmtcpaware/dmtcpaware.h for the functions available.
See test/dmtcpaware[123].c for three example applications.
For an example of its usage, try:

 cd test; rm dmtcpaware1; make dmtcpaware1; ./autotest \-v dmtcpaware1

The user application should link with libdmtcpaware.so (\-ldmtcpaware)
and use the header file dmtcp/dmtcpaware.h.

.SH "RETURN CODE"
A target program under DMTCP control normally returns the same return code
as if executed without DMTCP.  However, if DMTCP fails (as opposed to the
target program failing), DMTCP returns a DMTCP-specific return code,
rc (or rc+1, rc+2 for two special cases), where rc is the integer value
of the environment variable DMTCP_FAIL_RC if set, or else the default
value, 99.
.SH "SEE ALSO"
Full documentation is available at the DMTCP home page:
 http://dmtcp.sourceforge.net/
.br
FAQ: http://dmtcp.sourceforge.net/FAQ.html
.br
Plugins: https://github.com/dmtcp/dmtcp/blob/master/doc/plugin-tutorial.pdf
	(DMTCP-2.x)
.br
  examples: https://github.com/dmtcp/dmtcp/tree/master/test/plugin
.SH "AUTHORS"
DMTCP and its standalone single\-process component MTCP (MultiThreaded
CheckPointing) were created by the long-term contributors, Jason Ansel,
Kapil Arya, Gene Cooperman, Rohan Garg, Artem Y. Polyakov, Mike Rieker,
and a series of additional contributors including Alex Brick, Tyler
Denniston, William Enright, Gregory Kerr, Ana-Maria Visan, and others.
For support, see DMTCP home page.
